Inforex - a collaborative system for text corpora annotation and analysis

نویسندگان

  • Michal Marcinczuk
  • Marcin Oleksy
  • Jan Kocon
چکیده

We report a first major upgrade of Inforex — a web-based system for qualitative and collaborative text corpora annotation and analysis. Inforex is a part of Polish CLARIN infrastructure1. It is integrated with a digital repository for storing and publishing language resources2 and it allows to visualize, browse and annotate text corpora stored in the repository. As a result of a series of workshops for researchers in Humanities and Social Sciences we improved the graphical interface to make the system more friendly and readable for non-experienced users. We also implemented a new functionality for a gold standard annotation which includes private annotations and annotation agreement by a super-annotator.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Inforex - a web-based tool for text corpus management and semantic annotation

The aim of this paper is to present a system for semantic text annotation called Inforex. Inforex is a web-based system designed for managing and annotating text corpora on the semantic level including annotation of Named Entities (NE), anaphora, Word Sense Disambiguation (WSD) and relations between named entities. The system also supports manual text clean-up and automatic text pre-processing ...

متن کامل

Phrase Detectives: A Web-based Collaborative Annotation Game

Annotated corpora of the size needed for modern computational linguistics research cannot be created by small groups of hand annotators. One solution is to exploit collaborative work on the Web and one way to do this is through games like the ESP game. Applying this methodology however requires developing methods for teaching subjects the rules of the game and evaluating their contribution whil...

متن کامل

PACTE : a collaborative platform for textual annotation

In this article, we provide an overview of a web-based text annotation platform, called PACTE. We highlight the various features contributing to making PACTE an ideal platform for research projects involving textual annotation of large corpora performed by geographically distributed teams.

متن کامل

استخراج پیکره‌ موازی از اسناد قابل‌مقایسه برای بهبود کیفیت ترجمه در سیستم‌های ترجمه ماشینی

Data used for training statistical machine translation method are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation but due to lack of these kinds of texts, non-parallel and comparable corpora are used either for parallel text extraction. Most of existing methods for exploiting comparable corpora loo...

متن کامل

Collecting Life Logs for Experience-Based Corpora

In this paper we propose an approach to lightweight acquisition, sharing and annotation of experience-based corpora via mobile devices. Corpora acquisition is the crucial and often costly process in speech and language science and engineering. To address this problem, we have built a system for creating a location based corpora annotated with multimedia tags (e.g. text, speech, image) generated...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017